
Top-Down Induction of Decision Trees: Rigorous Guarantees and Inherent Limitations

Author: Blanc Guy
Lange Jane
Tan Li-Yang
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 11th Innovations in Theoretical Computer Science Conference (ITCS 2020)
Publication date: 17/11/2019

Consider the following heuristic for building a decision tree for a function $f : \{0,1\}^n \to \{\pm 1\}$. Place the most influential variable $x_i$ of $f$ at the root, and recurse on the subfunctions $f_{x_i=0}$ and $f_{x_i=1}$ on the left and right subtrees respectively; terminate once the tree is an $\varepsilon$-approximation of $f$. We analyze the quality of this heuristic, obtaining near-matching upper and lower bounds:

∘ Upper bound: For every $f$ with decision tree size $s$ and every $\varepsilon \in (0,\frac{1}{2})$, this heuristic builds a decision tree of size at most $s^{O(\log(s/\varepsilon)\log(1/\varepsilon))}$.

∘ Lower bound: For every $\varepsilon \in (0,\frac{1}{2})$ and $s \le 2^{\tilde{O}(\sqrt{n})}$, there is an $f$ with decision tree size $s$ such that this heuristic builds a decision tree of size $s^{\tilde{\Omega}(\log s)}$.

We also obtain upper and lower bounds for monotone functions: $s^{O(\sqrt{\log s}/\varepsilon)}$ and $s^{\tilde{\Omega}(\sqrt[4]{\log s})}$ respectively. The lower bound disproves conjectures of Fiat and Pechyony (2004) and Lee (2009). Our upper bounds yield new algorithms for properly learning decision trees under the uniform distribution. We show that these algorithms, which are motivated by widely employed and empirically successful top-down decision tree learning heuristics such as ID3, C4.5, and CART, achieve provable guarantees that compare favorably with those of the current fastest algorithm (Ehrenfeucht and Haussler, 1989). Our lower bounds shed new light on the limitations of these heuristics. Finally, we revisit the classic work of Ehrenfeucht and Haussler. We extend it to give the first uniform-distribution proper learning algorithm that achieves polynomial sample and memory complexity, while matching its state-of-the-art quasipolynomial runtime.
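A minimal Python sketch of the top-down heuristic described at the start of this abstract, for small $n$ only. It assumes `f` takes a tuple of $n$ bits and returns ±1, computes influences exactly by enumerating the relevant subcube, and breaks ties in favour of the lowest index. As a simplifying assumption (not necessarily the authors' exact stopping rule), it stops splitting a node once the subfunction there is $\varepsilon$-close to a constant; since the tree's error is the probability-weighted average of its leaf errors, this still guarantees the whole tree is an $\varepsilon$-approximation of $f$. The names `build`, `influence`, `majority_leaf`, `evaluate`, `size`, and `maj3` are hypothetical.

```python
from itertools import product

# Hypothetical tree representation: a leaf is a label (+1 or -1); an internal
# node is (i, left, right), meaning "query x_i; go left if x_i = 0, else right".


def _completions(n, fixed, also_free=()):
    """Yield each x in {0,1}^n consistent with `fixed` (as a mutable list);
    coordinates in `also_free` are left at 0 for the caller to set."""
    free = [j for j in range(n) if j not in fixed and j not in also_free]
    for bits in product((0, 1), repeat=len(free)):
        x = [0] * n
        for j, v in fixed.items():
            x[j] = v
        for j, v in zip(free, bits):
            x[j] = v
        yield x


def influence(f, n, fixed, i):
    """Pr over uniform x (consistent with `fixed`) that flipping x_i changes f(x)."""
    changes = total = 0
    for x in _completions(n, fixed, also_free=(i,)):
        x[i] = 0
        a = f(tuple(x))
        x[i] = 1
        b = f(tuple(x))
        changes += a != b
        total += 1
    return changes / total


def majority_leaf(f, n, fixed):
    """Best constant label on the subcube `fixed`, and its error rate there."""
    values = [f(tuple(x)) for x in _completions(n, fixed)]
    positives = values.count(1)
    label = 1 if 2 * positives >= len(values) else -1
    return label, min(positives, len(values) - positives) / len(values)


def build(f, n, eps, fixed=None):
    fixed = {} if fixed is None else fixed
    label, err = majority_leaf(f, n, fixed)
    if err <= eps:  # simplified stop rule: this node is eps-close to a constant
        return label
    free = [i for i in range(n) if i not in fixed]
    # Most influential variable of the current subfunction; max() keeps the
    # first (lowest-index) maximizer on ties.
    best = max(free, key=lambda i: influence(f, n, fixed, i))
    return (best,
            build(f, n, eps, {**fixed, best: 0}),
            build(f, n, eps, {**fixed, best: 1}))


def evaluate(tree, x):
    while not isinstance(tree, int):
        i, left, right = tree
        tree = right if x[i] else left
    return tree


def size(tree):
    """Number of leaves."""
    return 1 if isinstance(tree, int) else size(tree[1]) + size(tree[2])


def maj3(x):
    """Majority of three bits, as a +1/-1 function."""
    return 1 if sum(x) >= 2 else -1


tree = build(maj3, 3, 0.0)
print(tree, size(tree))
```

Recomputing influences by brute force at every node is exponential in $n$; it keeps the sketch faithful to the definition, whereas a practical learner would estimate influences from samples.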

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Learning Stochastic Decision Trees

Author: Blanc Guy
Lange Jane
Tan Li-Yang
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)
Publication date: 01/01/2021

Dagstuhl Research Online Publication Server

A Strong Composition Theorem for Junta Complexity and the Boosting of Property Testers

Author: Blanc Guy
Koch Caleb
Strassle Carmen
Tan Li-Yang
Publication date: 08/07/2023

We prove a strong composition theorem for junta complexity and show how such theorems can be used to generically boost the performance of property testers. The $\varepsilon$-approximate junta complexity of a function $f$ is the smallest integer $r$ such that $f$ is $\varepsilon$-close to a function that depends only on $r$ variables. A strong composition theorem states that if $f$ has large $\varepsilon$-approximate junta complexity, then $g \circ f$ has even larger $\varepsilon'$-approximate junta complexity, even for $\varepsilon' \gg \varepsilon$. We develop a fairly complete understanding of this behavior, proving that the junta complexity of $g \circ f$ is characterized by that of $f$ along with the multivariate noise sensitivity of $g$. For the important case of symmetric functions $g$, we relate their multivariate noise sensitivity to the simpler and well-studied case of univariate noise sensitivity. We then show how strong composition theorems yield boosting algorithms for property testers: with a strong composition theorem for any class of functions, a large-distance tester for that class is immediately upgraded into one for small distances. Combining our contributions yields a booster for junta testers, and with it new implications for junta testing. This is the first boosting-type result in property testing, and we hope that the connection to composition theorems adds compelling motivation to the study of both topics.

Comment: 44 pages, 1 figure, FOCS 202
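To make the quantities in this abstract concrete, here is a brute-force Python sketch for small $n$, assuming the uniform distribution on $\{0,1\}^n$ and ±1-valued functions. `junta_distance` uses the fact that the closest function depending only on the coordinates in $S$ takes, for each setting of $S$, the majority value of $f$ over the remaining coordinates; `junta_complexity` then searches $r = 0, 1, \dots, n$. `noise_sensitivity` computes only the standard univariate quantity $\mathrm{NS}_\delta(g) = \Pr[g(x) \ne g(y)]$, where $y$ flips each bit of $x$ independently with probability $\delta$; the paper's multivariate variant is not reproduced here. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product


def junta_distance(f, n, S):
    """Distance, under the uniform distribution on {0,1}^n, from f to the
    closest function that depends only on the coordinates in S."""
    rest = [j for j in range(n) if j not in S]
    errors = 0
    for a in product((0, 1), repeat=len(S)):
        positives = 0
        for b in product((0, 1), repeat=len(rest)):
            x = [0] * n
            for j, v in zip(S, a):
                x[j] = v
            for j, v in zip(rest, b):
                x[j] = v
            positives += f(tuple(x)) == 1
        # On this slice the best choice is the majority label; it errs on the minority.
        errors += min(positives, 2 ** len(rest) - positives)
    return Fraction(errors, 2 ** n)


def junta_complexity(f, n, eps):
    """Smallest r such that f is eps-close to some function of r variables."""
    for r in range(n + 1):
        if any(junta_distance(f, n, S) <= eps for S in combinations(range(n), r)):
            return r
    return n  # unreachable: with r = n the distance is 0


def noise_sensitivity(g, n, delta):
    """NS_delta(g) = Pr[g(x) != g(y)], x uniform, y = x with each bit flipped w.p. delta."""
    delta = Fraction(delta)
    total = Fraction(0)
    for x in product((0, 1), repeat=n):
        for z in product((0, 1), repeat=n):  # z marks which bits get flipped
            k = sum(z)
            y = tuple(xi ^ zi for xi, zi in zip(x, z))
            if g(x) != g(y):
                total += delta ** k * (1 - delta) ** (n - k)
    return total / 2 ** n


def maj3(x):
    """Majority of three bits, as a +1/-1 function (a symmetric example)."""
    return 1 if sum(x) >= 2 else -1


print(junta_complexity(maj3, 3, Fraction(1, 4)), noise_sensitivity(maj3, 3, Fraction(1, 10)))
```

Exact `Fraction` arithmetic is used so that comparisons such as "distance $\le \varepsilon$" are not affected by floating-point rounding.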

arXiv.org e-Print Archive

Decision Tree Heuristics Can Fail, Even in the Smoothed Setting

Author: Blanc Guy
Lange Jane
Qiao Mingda
Tan Li-Yang
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021)
Publication date: 01/01/2021

Greedy decision tree learning heuristics are mainstays of machine learning practice, but theoretical justification for their empirical success remains elusive. In fact, it has long been known that there are simple target functions for which they fail badly (Kearns and Mansour, STOC 1996). Recent work of Brutzkus, Daniely, and Malach (COLT 2020) considered the smoothed analysis model as a possible avenue towards resolving this disconnect. Within the smoothed setting and for targets f that are k-juntas, they showed that these heuristics successfully learn f with depth-k decision tree hypotheses. They conjectured that the same guarantee holds more generally for targets that are depth-k decision trees. We provide a counterexample to this conjecture: we construct targets that are depth-k decision trees and show that even in the smoothed setting, these heuristics build trees of depth 2^{Ω(k)} before achieving high accuracy. We also show that the guarantees of Brutzkus et al. cannot extend to the agnostic setting: there are targets that are very close to k-juntas, for which these heuristics build trees of depth 2^{Ω(k)} before achieving high accuracy.
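A minimal Python sketch of one greedy step of a CART-style heuristic under a smoothed product distribution. As an assumption for illustration only (not necessarily the exact model of Brutzkus et al.), smoothing is modelled by drawing each coordinate's bias $p_i = \Pr[x_i = 1]$ uniformly from $[1/2 - c, 1/2 + c]$; the split is the variable that most reduces the binary Gini impurity $2p(1-p)$ on a sample. The names `smoothed_biases`, `draw_samples`, `gini`, and `best_split` are hypothetical.

```python
import random


def smoothed_biases(n, c, rng):
    """Hypothetical smoothing model: p_i = Pr[x_i = 1] drawn uniformly from [1/2 - c, 1/2 + c]."""
    return [0.5 + rng.uniform(-c, c) for _ in range(n)]


def draw_samples(f, biases, m, rng):
    """Draw m labelled examples (x, f(x)) from the product distribution with these biases."""
    samples = []
    for _ in range(m):
        x = tuple(1 if rng.random() < p else 0 for p in biases)
        samples.append((x, f(x)))
    return samples


def gini(labels):
    """Binary Gini impurity 2p(1 - p), where p is the fraction of +1 labels."""
    if not labels:
        return 0.0
    p = sum(1 for y in labels if y == 1) / len(labels)
    return 2 * p * (1 - p)


def best_split(samples, n):
    """Index of the variable whose split most reduces the weighted Gini impurity."""
    m = len(samples)
    base = gini([y for _, y in samples])

    def gain(i):
        left = [y for x, y in samples if x[i] == 0]
        right = [y for x, y in samples if x[i] == 1]
        return base - len(left) / m * gini(left) - len(right) / m * gini(right)

    return max(range(n), key=gain)


rng = random.Random(0)
biases = smoothed_biases(5, 0.1, rng)


def target(x):
    """A 1-junta on x_2, as a +1/-1 function."""
    return 1 if x[2] else -1


print(best_split(draw_samples(target, biases, 2000, rng), 5))
```

Because `best_split` looks only one level ahead, it can be misled on targets where no single variable is individually informative; the counterexamples described above show that such failures persist for some depth-k targets even under smoothing.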

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server